Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Mar 6, 2017

Hi, this is the first implementation. So we can have

let str = {js|你的名字|js};;

let x_1 = "world";;

let x_2 = {js| Bucklescript by 彭博 |js};;

let es6 = {j|hello $x_1,欢迎来到 $(x_2)|j};;

let es62 = {j|$str, 君の名は|j}

let a = "a";;

let b = "b";;

let c = a ^ b;;

let d = (^) a b;;

let c = Js.Json.stringify str;;
let () = Js.log str;;

compiled to

// Generated by BUCKLESCRIPT VERSION 1.5.1+dev, PLEASE EDIT WITH CARE
'use strict';


var str = "你的名字";

var x_1 = "world";

var x_2 = " Bucklescript by 彭博 ";

var es6 = "" + "hello " + JSON.stringify(x_1) + ",欢迎来到 " + JSON.stringify(x_2);

var es62 = "" + JSON.stringify(str) + ", 君の名は";

var a = "a";

var b = "b";

var d = "ab";

var c = JSON.stringify(str);

console.log(str);

exports.str  = str;
exports.x_1  = x_1;
exports.x_2  = x_2;
exports.es6  = es6;
exports.es62 = es62;
exports.a    = a;
exports.b    = b;
exports.d    = d;
exports.c    = c;
/* es6 Not a pure module */

From what I can see here there are a few issues I will be working on to improve this:

  1. Do not use JSON.stringify to convert any object into a string. This should be done by other means, but my JS knowledge is limited here. Any good suggestions?

  2. More unit testing on how we report error for certain location

  3. To use regex in OCaml I added the dependency on the standard library's Str module, do you have an opinion on this? It's named Ext_str.
    It depends on a C file, which I also pulled in, but if you use clang to compile it, it will raise some warnings. I will look into removing them later.

Anyway, any other suggestions are very welcomed!

@glennsl
Copy link
Contributor

glennsl commented Mar 7, 2017

This is awesome @dorafmon. Here's my opinion:

  1. It probably shouldn't do automatic string conversion, it would be more idiomatic and a bit safer not to. But the right way to do it is to either use the String type constructor (available as Js.String.make) or the toString method that is available on every js object (but not currently exposed via FFI I think), though the latter puts you at risk of exceptions if you try to call the method on a null or undefined value.

  2. The use of regexes here seems simple enough that it might not be worth it to pull in an extra dependency. But if you need a regex library, this is a pure OCaml implementation that looks very, very good: https://github.com/ocaml/ocaml-re

(** Compile a regular expression. The syntax for regular expressions
is the same as in Gnu Emacs. The special characters are
[$^.*+?[]]. The following constructs are recognized:
- [. ] matches any character except newline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str comes with ocaml core distribution, so you don't need copy here. But I would suggest not using it, since you only need one function which can be replaced with a string function easily

val test : 'a -> 'b kind -> bool

external parse : string -> t = "JSON.parse" [@@bs.val]
external stringify: 'a -> string = "JSON.stringify" [@@bs.val]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about adding on function in Js.String

external ofAny : 'a -> t = "String" [@@bs.val]

Copy link
Contributor

@glennsl glennsl Mar 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already Js.String.make, which does exactly that

let rec print_string_list ss = match ss with
| [] -> ()
| (hd::tl) -> let _ = print_endline hd in print_string_list tl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List.iter print_endline


let es62 = {j|$str, 君の名は|j}

let a = "a";;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need add a test case here:

let a = {j| blabla \$(xx) |j} (* should not be interpolated*)
let b = {j| blabla \$xxx |j} (* should not be interpolated *)

} in _transform_individual_expression rexp new_loc (new_exp::nl))

let transform_es6_style_template_string s loc =
let sub_strs = Ext_str.full_split template_string_regex s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a Ext_string.split, but you need verify it is utf8 first

| [] -> prev
| (e::re) ->
let string_concat_exp:Parsetree.expression = {e with pexp_desc = Parsetree.Pexp_ident {txt = Longident.Lident ("^"); loc = e.pexp_loc}} in
let new_string_exp = {e with pexp_desc = Parsetree.Pexp_apply (string_concat_exp, [("", prev); ("", e)])} in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To play safe, qualified it as Pervasives.(^) instead of (^)

@bobzhang
Copy link
Member

bobzhang commented Mar 7, 2017

The CI error means you need link str.cmxa (for c stubs), but I think here you don't need regex though


external parse : string -> t = "JSON.parse" [@@bs.val]
(* TODO: more docs when parse error happens or stringify non-stringfy value *)
external stringify: 'a -> string = "JSON.stringify" [@@bs.val]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be t or 'a?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I don't need this anymore. Will move this change to another PR later.

@ghost
Copy link
Author

ghost commented Mar 8, 2017

@bobzhang Hi, I will remove the dependency on Ext_str but is there any reason why we cannot have the Str module in the compiler? Thanks!

@ghost
Copy link
Author

ghost commented Mar 8, 2017

@glennsl Hi thanks I will use Js.string.make then.

@ghost
Copy link
Author

ghost commented Mar 8, 2017

@bobzhang never mind I realized we probably do not want unnecessary dependencies anyway. I will replace it with a function.

@bobzhang
Copy link
Member

@dorafmon let me know when it is good for review ; )

@ghost
Copy link
Author

ghost commented Mar 13, 2017

@bobzhang sure, I am missing out some unit tests, that's why I did not ask for a review. Will work on this and let you know. 😄

@ghost
Copy link
Author

ghost commented Mar 14, 2017

@bobzhang hi I think it's ready for a review. Thanks!

Copy link
Member

@bobzhang bobzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you use ocp-indent for indenting, I would suggest have it set up, it is a nice tool




let append s c = s ^ String.make 1 c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name is confusing. append_char?

end;
__LOC__ >:: begin fun _ ->
let s, new_index = Ast_utf8_string.consume_text "Hello \\$world" 0 in
let _ = s =~ "Hello \\$world" in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a few more edge cases?

"\$"
"\$x"
"\\$x"
"\\$" 
"\\\$"
"\\\$x"
"\\\\$x"

Copy link
Author

@ghost ghost Mar 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused here, what is the expected behavior? Note here "Hello \\$world" in fact equals to {j|Hello \$world|j} since the OCaml parser will do some processing and escape the \\$ to \$. Thanks.

if index = String.length s then new_word, String.length s
else begin
match s.[index] with
| '$' -> if last_char = '\\' then _consume_text s (index+1) '$' (Ext_string.append new_word '$')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is incorrect, see my edge cases above

| '(' -> (if !with_par = false then (with_par := true; _consume_delim s (index+1) ident) else (None, index))
| ')' -> (if !with_par = false then (None, index + 1) else (with_par := false; (Some ident, index+1)))
| '$' -> (_consume_delim s (index+1) ident)
| c -> if (Char.code c >= Char.code 'a' && Char.code c <= Char.code 'z') ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ocaml support char range patterns

| 'a' ... 'z' | 'A' .. 'Z' | '0' .. '9'| '_'

} in _transform_individual_expression rexp new_loc (new_exp::nl))
| Delim p -> (let new_loc = compute_new_loc loc p in
let ident = Parsetree.Pexp_ident { txt = (Longident.Lident p); loc = loc } in
let js_to_string = Parsetree.Pexp_ident { txt =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you factor out it into a small function?

| [] -> prev
| (e::re) ->
let string_concat_exp:Parsetree.expression = {e with pexp_desc = Parsetree.Pexp_ident
{txt = Longident.Ldot (Longident.Lident ("Pervasives"), "^"); loc = e.pexp_loc}} in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Location.raise_errorf ~loc "{j||j} is reserved for future use"
else if Ext_string.equal delim Literals.unescaped_j_delimiter then
let starting_loc = {loc with loc_end = loc.loc_start} in
let empty_string_concat_exp = {e with pexp_desc = Pexp_constant (Const_string ("", None)); pexp_loc = starting_loc} in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get rid of empty_string_concat_exp?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to get rid of it? I think it makes the implementation much simpler if we keep it.

if index >= String.length s then List.rev nl
else begin
match consume_text s index, consume_delim s index with
| ("" , str_index) , (None , err_index) -> let new_loc = error_reporting_loc loc index err_index in Location.raise_errorf ~loc:new_loc "Not a valid es6 template string"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of raising directly, can we return an optional for easy unit-testing?
The current unit testing depends on compiler-libs, it would be nice that such utility is self-contained

@bobzhang
Copy link
Member

bobzhang commented Mar 15, 2017 via email

@ghost
Copy link
Author

ghost commented Mar 15, 2017

@bobzhang sorry I don't quite get what you mean here, functions like consume_delim/consume_text which are tested in the OUnit unit tests work on the strings we extracted from {j|...|j}. I guess I can add some BS tests in the test folder to directly test the template strings.

@bobzhang
Copy link
Member

@dorafmon it seems your code does not work on {j|\\$x|j}, yes,it is nice to always use {||} in testing when you are sensitive about escaping rules

@ghost
Copy link
Author

ghost commented Mar 15, 2017

It will be compiled to var escape0 = "" + "\\$x"; which is correct right? Since \$ is escaped and we should just keep both \.

@bobzhang
Copy link
Member

it should be "\" + x

@ghost
Copy link
Author

ghost commented Mar 16, 2017

Just to clarify here, we only need to escape \\ ad \ and \$ as $ right. We don't need to implement the full escaping rules as specified by https://caml.inria.fr/pub/docs/manual-ocaml/lex.html

Copy link
Member

@bobzhang bobzhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the lexical convention will follow js style, would you take a look at how we do utf8 decoding https://github.com/bloomberg/bucklescript/blob/master/jscomp/syntax/ast_utf8_string.ml#L26

I think you need to that in the first pass, other the location would be wrong, for example
{j|你好$x|j} the offset of x would be 6 if you don't do utf8 decoding

else (new_word, index)
| c -> _consume_text s (index + 1) c (Ext_string.append new_word c)
| '\\' -> (if index + 1 = String.length s then "", index else
match s.[index+1] with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may get out of bound

Char.code c = Char.code '_'
then _consume_delim s (index+1) (Ext_string.append ident c)
else if !with_par = false then (Some ident, index) else (None, index + 1)
| 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9'| '_' ->_consume_delim s (index+1) (Ext_string.append_char ident s.[index])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@ghost
Copy link
Author

ghost commented Apr 1, 2017

@bobzhang Hi, I rebased my changes to master and have fixed some small issues according to the code review (do a utf8 decoding and OCaml lexical escape first so we get the location correct). As I understand that we know have a different error reporting rather than Location.raise_error? Could you point me to that so I can change my code to use that? Overall this is well tested already, let's fix this and ship it.

Format.pp_print_string ppf if_highlight
else begin
fprintf ppf "%a%a %s" print loc print_error_prefix () msg;
List.iter (Format.fprintf ppf "@\n@[<2>%a@]" default_error_reporter) sub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, would you update your ocaml submodule? I cherry picked a fix from upstream, thanks

let starting_loc = {loc with loc_end = loc.loc_start} in
let empty_string_concat_exp = {e with pexp_desc = Pexp_constant (Const_string ("", None)); pexp_loc = starting_loc} in
let exps_list = Ast_utf8_string.transform_es6_style_template_string s starting_loc in
Ast_utf8_string.fold_expression_list_with_string_concat empty_string_concat_exp exps_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just export one function in Ast_utf8_string, so it looks simpler here

@@ -1,5 +1,5 @@
(* Copyright (C) 2015-2016 Bloomberg Finance L.P.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you create an interface file for this module, sorry for my laziness

| '$' -> _consume_text s (index+2) ' ' (Ext_string.append_char new_word '$')
| c -> _consume_text s (index+1) '\\' (Ext_string.append_char new_word '\\'))
| '$' -> (new_word, index)
| c -> _consume_text s (index + 1) c (Ext_string.append_char new_word c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extend_string.append_char is highly in-efficient. you should record offset, instead of creating new string each time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also there is no utf8 involvement when consuming(decoding)

if index = String.length s then (if !with_par = true then (None, index) else (Some ident, index))
else
match s.[index] with
| '(' -> (if !with_par = false then (with_par := true; _consume_delim s (index+1) ident) else (None, index))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unclear about such piece of code, our interpolation is quite simple(no nested interpolation).
it is only $x or $(x)

let error_reporting_loc (loc:Location.t) start_index end_index =
let new_loc =
{loc with loc_start = {loc.loc_start with pos_cnum = loc.loc_start.pos_cnum + start_index};
loc_end = {loc.loc_end with pos_cnum = loc.loc_start.pos_cnum + end_index }} in new_loc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should record offset, instead of creating new loc each time, otherwise, it would do lots of allocation

@ghost ghost closed this Apr 9, 2017
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants